Applications in Computer Vision
FIGURE 6.12
Convergence of Faster-RCNN with a ResNet-18 backbone (left) and SSD with a VGG-16 backbone
(right) under different binarization schemes, trained on VOC trainval2007 and trainval2012.
FIGURE 6.13
Input images and saliency maps, following [79]. The images are randomly selected from
VOC test2007. Each row shows: (a) the input image, and the saliency maps of (b) Faster-RCNN
with a ResNet-101 backbone (Res101), (c) Faster-RCNN with a ResNet-18 backbone (Res18),
and (d) 1-bit Faster-RCNN with a ResNet-18 backbone (BiRes18).
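Saliency maps such as those in Fig. 6.13 highlight how strongly each input pixel influences a model's score. As a minimal, self-contained sketch (the exact method of [79] may differ), saliency can be approximated as the magnitude of the score's gradient with respect to each pixel; here a toy scoring function stands in for a detector's proposal score, and the gradient is estimated with central finite differences:

```python
import numpy as np

def toy_score(img):
    # Stand-in for a detector's score on a proposal:
    # responds only to the centre patch of the image.
    h, w = img.shape
    centre = img[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]
    return float((centre ** 2).sum())

def saliency_map(score_fn, img, eps=1e-3):
    # |d score / d pixel|, estimated by central finite differences.
    # In practice this gradient would come from backpropagation.
    sal = np.zeros_like(img)
    for idx in np.ndindex(img.shape):
        bump = np.zeros_like(img)
        bump[idx] = eps
        sal[idx] = abs(score_fn(img + bump) - score_fn(img - bump)) / (2 * eps)
    return sal

rng = np.random.default_rng(0)
img = rng.random((8, 8))
sal = saliency_map(toy_score, img)
# Pixels outside the centre patch do not affect the score,
# so their saliency is exactly zero; centre pixels light up.
assert sal[0, 0] == 0.0 and sal[4, 4] > 0.0
```

Comparing such maps between a teacher and a student (e.g. panels (b)-(d) of Fig. 6.13) makes their information discrepancy on proposals visible.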
that knowledge distillation (KD) methods such as [235] are effective for distilling real-valued
Faster-RCNNs only when the teacher model and its student counterpart exhibit a small
information discrepancy on their proposals, as shown in Fig. 6.13 (b) and (c). This is not
the case for the 1-bit Faster-RCNN, as shown in Fig. 6.13 (b) and (d), which might
explain why existing KD methods are less effective on 1-bit detectors. Statistics on the
COCO and PASCAL VOC datasets in Fig. 6.14 show that the discrepancy between the